876 research outputs found

    Formal and Informal Model Selection with Incomplete Data

    Model selection and assessment with incomplete data pose challenges beyond those encountered with complete data. There are two main reasons for this. First, many models describe characteristics of the complete data, even though only an incomplete subset is observed. Direct comparison between model and data is then less than straightforward. Second, many commonly used models are more sensitive to assumptions than in the complete-data situation, and some of their properties vanish when they are fitted to incomplete, unbalanced data. These and other issues are brought forward using two key examples, one of a continuous and one of a categorical nature. We argue that model assessment ought to consist of two parts: (i) assessment of a model's fit to the observed data and (ii) assessment of the sensitivity of inferences to unverifiable assumptions, that is, to how a model describes the unobserved data given the observed ones. Comment: Published at http://dx.doi.org/10.1214/07-STS253 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    On the asymptotic behavior of the contaminated sample mean

    An observation of a cumulative distribution function F with finite variance is said to be contaminated according to the inflated variance model if it has a large probability of coming from the original target distribution F, but a small probability of coming from a contaminating distribution that has the same mean and shape as F, though a larger variance. It is well known that in the presence of data contamination, the ordinary sample mean loses many of its good properties, making it preferable to use more robust estimators. From a didactical point of view, it is insightful to see to what extent an intuitive estimator such as the sample mean becomes less favorable in a contaminated setting. In this paper, we investigate under which conditions the sample mean, based on a finite number of independent observations of F which are contaminated according to the inflated variance model, is a valid estimator for the mean of F. In particular, we examine to what extent this estimator is weakly consistent for the mean of F and asymptotically normal. As classical central limit theory is generally inadequate for handling asymptotic normality in this setting, we invoke more general approximate central limit theory as developed by Berckmoes, Lowen, and Van Casteren (2013). Our theoretical results are illustrated by a specific example and a simulation study. Comment: 14 pages, 1 figure
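The inflated variance model described in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's own simulation study: it assumes a normal target distribution F, a fixed contamination probability `eps`, and a contaminating distribution whose standard deviation is `inflation` times larger. All function and parameter names are hypothetical.

```python
import random
import statistics


def contaminated_sample(n, mean=0.0, sd=1.0, eps=0.05, inflation=5.0, seed=0):
    """Draw n observations under an (assumed) inflated variance model.

    Each observation comes from N(mean, sd^2) with probability 1 - eps and
    from N(mean, (inflation * sd)^2) with probability eps, so both components
    share the same mean and shape and differ only in variance.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        scale = sd * inflation if rng.random() < eps else sd
        draws.append(rng.gauss(mean, scale))
    return draws


def sample_mean(xs):
    """Ordinary sample mean of a list of floats."""
    return statistics.fmean(xs)
```

Because both components share the same mean, the sample mean remains centered on the target mean in this sketch. The contamination shows up as a larger variance, here (1 - eps) * sd^2 + eps * (inflation * sd)^2 per observation.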

    Discussion of Likelihood Inference for Models with Unobservables: Another View

    Discussion of "Likelihood Inference for Models with Unobservables: Another View" by Youngjo Lee and John A. Nelder [arXiv:1010.0303]. Comment: Published at http://dx.doi.org/10.1214/09-STS277A in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    The analysis of correlated non-Gaussian outcomes from clusters of size two: non-multilevel-based alternatives?

    In this presentation we discuss the analysis of clustered binary or count data when the cluster size is two. For Gaussian outcomes, linear mixed models that take the correlation within clusters into account are frequently used and well understood. Here we explore the potential of generalized linear mixed models (GLMMs) for the analysis of non-Gaussian outcomes that are possibly negatively correlated. Several approximation techniques (Gaussian quadrature, Laplace approximation or linearization) that are available in standard software packages for these GLMMs are investigated. Despite the different modelling options related to these techniques, none of them performs satisfactorily in estimating fixed effects when the within-cluster correlation is negative and/or the number of clusters is relatively small. In contrast, a generalized estimating equations (GEE) approach for the analysis of non-Gaussian data turns out to have an overall excellent performance. When using GEE, the robust score and Wald tests are recommended for small and large samples, respectively.

    On the sample mean after a group sequential trial

    A popular setting in medical statistics is a group sequential trial with independent and identically distributed normal outcomes, in which interim analyses of the sum of the outcomes are performed. Based on a prescribed stopping rule, one decides after each interim analysis whether the trial is stopped or continued. Consequently, the actual length of the study is a random variable. It is reported in the literature that the interim analyses may cause bias if one uses the ordinary sample mean to estimate the location parameter. For a generic stopping rule, which contains many classical stopping rules as a special case, explicit formulas for the expected length of the trial, the bias, and the mean squared error (MSE) are provided. It is deduced that, for a fixed number of interim analyses, the bias and the MSE converge to zero if the first interim analysis is performed not too early. In addition, optimal rates for this convergence are provided. Furthermore, under a regularity condition, asymptotic normality in total variation distance for the sample mean is established. A conclusion for naive confidence intervals based on the sample mean is derived. It is also shown how the developed theory naturally fits in the broader framework of likelihood theory in a group sequential trial setting. A simulation study underpins the theoretical findings. Comment: 52 pages (supplementary data file included)
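The bias mentioned in this abstract can be seen in a toy simulation. The sketch below is not the paper's generic stopping rule or its simulation study: it assumes a single interim analysis after `n1` unit-variance normal outcomes, stops the trial if the running sum exceeds a hypothetical `boundary`, and otherwise continues to `n2` outcomes. All names are illustrative.

```python
import random
import statistics


def run_trial(rng, mu, n1, n2, boundary):
    """Run one two-stage trial and return the sample mean at stopping time."""
    total = sum(rng.gauss(mu, 1.0) for _ in range(n1))
    if total > boundary:
        # Assumed stopping rule: stop at the interim analysis if the sum is large.
        return total / n1
    total += sum(rng.gauss(mu, 1.0) for _ in range(n2 - n1))
    return total / n2


def estimate_bias(mu=0.0, n1=10, n2=20, boundary=0.0, reps=20000, seed=0):
    """Monte Carlo estimate of E[sample mean] - mu under the assumed rule."""
    rng = random.Random(seed)
    means = [run_trial(rng, mu, n1, n2, boundary) for _ in range(reps)]
    return statistics.fmean(means) - mu
```

Under this assumed rule, trials that stop early are those with large interim sums, and their sample means are averaged over fewer observations, so the estimated bias is positive. Increasing `n1` while keeping the ratio of `n1` to `n2` fixed shrinks it, in line with the abstract's remark about not performing the first interim analysis too early.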

    Evaluating Mode Effects in Mixed-Mode Survey Data Using Covariate Adjustment Models

    The confounding of selection and measurement effects between different modes is a disadvantage of mixed-mode surveys. Solutions to this problem have been suggested in several studies. Most use adjusting covariates to control selection effects. Unfortunately, these covariates must meet strong assumptions, which are generally ignored. This article discusses these assumptions in greater detail and also provides an alternative model for solving the problem. This alternative uses adjusting covariates that explain measurement effects instead of selection effects. The application of both models is illustrated by using data from a survey on opinions about surveys, which yields mode effects in line with expectations for the latter model, and mode effects contrary to expectations for the former model. However, the validity of these results depends entirely on the (ad hoc) covariates chosen. Research into better covariates might thus be a topic for future studies.

    Generating Correlated and/or Overdispersed Count Data: A SAS Implementation

    Analysis of longitudinal count data has long been done using a generalized linear mixed model (GLMM), in its Poisson-normal version, to account for correlation by specifying normal random effects. Univariate counts are often handled with the negative binomial (NEGBIN) model, which accounts for overdispersion through gamma random effects. Inherently though, longitudinal count data commonly exhibit both correlation and overdispersion simultaneously, necessitating analysis methodology that can account for both. The introduction of the combined model (CM) by Molenberghs, Verbeke, and Demétrio (2007) and Molenberghs, Verbeke, Demétrio, and Vieira (2010) serves this purpose, not only for count data but for the general exponential family of distributions. Here, a Poisson model is specified as the parent distribution of the data with a normally distributed random effect at the subject or cluster level and/or a gamma distribution at observation level. The GLMM and NEGBIN model are special cases. Data can be simulated from (1) the general CM, with random effects, or (2) its marginal version directly. This paper discusses an implementation of (1) in SAS software (SAS Inc. 2011). One needs to reflect on the mean of both the combined (hierarchical) and marginal models in order to generate correlated and/or overdispersed counts. A pre-specification of the desired marginal mean (in terms of covariates and marginal parameters), a marginal variance-covariance structure and the hierarchical mean (in terms of covariates and regression parameters) is required. The implied hierarchical parameters, the variance-covariance matrix of the random effects, and the variance-covariance matrix of the overdispersion part are then derived, from which correlated Poisson data are generated. Sample calls of the SAS macro are presented as well as output.
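The hierarchical structure described in this abstract (a Poisson parent distribution, a normal random effect per subject, and a gamma factor per observation) can be sketched in a few lines. This is an illustration in Python, not the SAS macro the paper presents, and it skips the paper's step of deriving hierarchical parameters from a pre-specified marginal mean and covariance. The linear predictor `beta0 + beta1 * t`, the mean-one gamma parameterization, and all names are assumptions.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw a Poisson(lam) count by counting unit-rate exponential arrivals in [0, lam]."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed <= lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_combined_counts(n_subjects, n_times, beta0=1.0, beta1=0.1,
                             re_sd=0.5, gamma_shape=2.0, seed=0):
    """Simulate counts with a shared normal random intercept (correlation)
    and an independent mean-one gamma factor per observation (overdispersion)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        b = rng.gauss(0.0, re_sd)  # subject-level normal random effect
        row = []
        for t in range(n_times):
            # Gamma(shape, scale = 1/shape) has mean 1 and variance 1/shape.
            theta = rng.gammavariate(gamma_shape, 1.0 / gamma_shape)
            lam = theta * math.exp(beta0 + beta1 * t + b)
            row.append(poisson_draw(rng, lam))
        data.append(row)
    return data
```

Setting `re_sd=0` removes the within-subject correlation and leaves a negative binomial type model, while letting `gamma_shape` grow large drives the gamma factor toward 1 and approaches a Poisson-normal GLMM. This mirrors the abstract's statement that both are special cases.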

    Comments on: Missing data methods in longitudinal studies: a review

    Incomplete data are quite common in biomedical and other types of research, especially in longitudinal studies. During the last three decades, a vast amount of work has been done in the area. This has led, on the one hand, to a rich taxonomy of missing-data concepts, issues, and methods and, on the other hand, to a variety of data-analytic tools. Elements of taxonomy include: missing data patterns, mechanisms, and modeling frameworks; inferential paradigms; and sensitivity analysis frameworks. These are described in detail. A variety of concrete modeling devices is presented. To make matters concrete, two case studies are considered. The first one concerns quality of life among breast cancer patients, while the second one examines data from the Muscatine children’s obesity study.

    A goodness-of-fit test for the random-effects distribution in mixed models

    In this paper, we develop a simple diagnostic test for the random-effects distribution in mixed models. The test is based on the gradient function, a graphical tool proposed by Verbeke and Molenberghs to check the impact of assumptions about the random-effects distribution in mixed models on inferences. Inference is conducted through the bootstrap. The proposed test is easy to implement and applicable in a general class of mixed models. The operating characteristics of the test are evaluated in a simulation study, and the method is further illustrated using two real data analyses.